Introduction

The Federal National Mortgage Association (FNMA), popularly called Fannie Mae, is one of the largest players in the mortgage and mortgage-backed securities markets in the USA. In the years leading up to 2008, Fannie Mae suffered severe losses due to a high volume of loan defaults. The loan performance data from Fannie Mae serves as an important resource for businesses and investors seeking to understand the factors that lead to loan defaults.

In this report, I analyze some features that are potentially significant factors in a borrower defaulting on a loan. Specifically, I examine these features from a lender’s perspective. What key metrics should banks or financial organizations look at when lending money? What can be said about borrowers’ credit scores, income and other factors, and how are they correlated with loan defaults? The following sections examine these questions.

Loan to Value Ratio Analysis

In simple terms, the loan to value (LTV) ratio is a measure of how ‘risky’ you are as a borrower. LTV is calculated as the ratio of the amount borrowed to the value of the mortgaged property. A higher LTV ratio indicates a greater chance of defaulting on the loan.
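As a minimal sketch of the definition above, the ratio can be computed as follows. The function name `loan_to_value` and the sample amounts are hypothetical, and the result is expressed as a percentage on the assumption that this matches the scale of the LTV values quoted in this report.

```python
def loan_to_value(loan_amount: float, property_value: float) -> float:
    """Return the loan-to-value ratio as a percentage."""
    if property_value <= 0:
        raise ValueError("property_value must be positive")
    return 100.0 * loan_amount / property_value


# Hypothetical example: a 160,000 loan against a 200,000 property.
ltv = loan_to_value(160_000, 200_000)
print(ltv)  # 80.0
```

A borrower who makes a larger down payment on the same property borrows less, so the ratio falls.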

To understand the distribution of LTV in the fourth quarters of 2007 and 2019, let’s look at the histograms in Figure 1 and Figure 2.

From the histogram for 2007, it is seen that most borrowers had LTV ratios in the range 70 to 80, which is typically considered high. The distribution is right skewed and has a median of 77. Similarly, the histogram for 2019 is right skewed, with a median of 75. The 2007 distribution has significantly fewer borrowers in the lower LTV ranges, whereas in the 2019 histogram there is a significant number of borrowers in each LTV bin. This suggests that loans taken out in 2007 carried a higher chance of default than those taken out in 2019.

Loan Purpose and Default Rates

Next, we examine the loan purpose and identify the purpose category with the most defaults.

In the ‘Purpose vs loan default’ chart in the Figure Appendix, loans for the purpose of ‘refinance’ have the highest rate of defaults in 2007. The trend is different in 2019, when loans for ‘purchase’ have the highest default rate. The thing to note, however, is that the number of defaults is significantly lower in 2019 than in 2007. This suggests that economic conditions stabilized between 2007 and 2019.
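The comparison above relies on a default rate per purpose category. The sketch below shows one way such a rate could be computed. The function name `default_rate_by_group`, the field names `purpose` and `defaulted`, and the sample records are all hypothetical and are not taken from the Fannie Mae data.

```python
from collections import defaultdict


def default_rate_by_group(loans, key):
    """Return {group: share of loans in that group that defaulted}."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for loan in loans:
        group = loan[key]
        totals[group] += 1
        if loan["defaulted"]:
            defaults[group] += 1
    return {group: defaults[group] / totals[group] for group in totals}


# Made-up records for illustration only; not Fannie Mae data.
sample_loans = [
    {"purpose": "purchase", "defaulted": False},
    {"purpose": "purchase", "defaulted": True},
    {"purpose": "refinance", "defaulted": True},
    {"purpose": "refinance", "defaulted": True},
    {"purpose": "refinance", "defaulted": False},
    {"purpose": "refinance", "defaulted": False},
]
rates = default_rate_by_group(sample_loans, "purpose")
print(rates)  # {'purchase': 0.5, 'refinance': 0.5}
```

In this made-up sample, ‘refinance’ has twice as many defaults as ‘purchase’ but the same default rate, which is why the rate and the number of defaults are best read separately.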

Credit Score Analysis

Since credit score is a key factor when getting a loan, I have grouped the credit scores of borrowers into 5 categories: ‘Poor’, ‘Fair’, ‘Good’, ‘Very Good’ and ‘Exceptional’. For each category, I have calculated the default rate to show that borrowers with lower credit scores are more likely to default on loans. This is shown in Figure 3.
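The grouping step can be sketched as below. The report does not state the cut-offs used for each category, so the thresholds here are an assumption based on the commonly published FICO ranges (300 to 850). The function names and the sample records are hypothetical.

```python
CATEGORIES = ["Poor", "Fair", "Good", "Very Good", "Exceptional"]


def credit_category(score: int) -> str:
    """Map a credit score to one of five categories (assumed FICO cut-offs)."""
    if not 300 <= score <= 850:
        raise ValueError("score outside the assumed 300-850 range")
    if score < 580:
        return "Poor"
    if score < 670:
        return "Fair"
    if score < 740:
        return "Good"
    if score < 800:
        return "Very Good"
    return "Exceptional"


def default_rate_by_category(records):
    """records: iterable of (credit_score, defaulted) pairs."""
    totals = {name: 0 for name in CATEGORIES}
    defaults = {name: 0 for name in CATEGORIES}
    for score, defaulted in records:
        name = credit_category(score)
        totals[name] += 1
        defaults[name] += int(defaulted)
    # Categories with no loans are skipped to avoid dividing by zero.
    return {n: defaults[n] / totals[n] for n in CATEGORIES if totals[n]}


# Made-up records for illustration only; not Fannie Mae data.
sample = [
    (550, True), (560, False),
    (640, True), (650, False), (660, False),
    (700, False), (720, True), (730, False), (735, False),
    (760, False), (790, False),
    (810, False),
]
category_rates = default_rate_by_category(sample)
for name, rate in category_rates.items():
    print(f"{name}: {rate:.2f}")
```

Keeping the categories in a fixed list preserves the ‘Poor’ to ‘Exceptional’ order when the rates are printed or plotted.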

Debt to Income Ratio Analysis

The debt to income (DTI) ratio is a metric that compares your monthly debt payments to your monthly income. A lower DTI indicates a better balance between debt and income. Hence, lenders prefer borrowers with a lower DTI, since they are less likely to default.
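The definition above can be illustrated with a short sketch. The function name `debt_to_income` and the sample amounts are hypothetical, and the ratio is expressed as a percentage on the assumption that this matches the scale of the DTI values quoted in this report.

```python
def debt_to_income(monthly_debt: float, monthly_income: float) -> float:
    """Return the debt-to-income ratio as a percentage."""
    if monthly_income <= 0:
        raise ValueError("monthly_income must be positive")
    return 100.0 * monthly_debt / monthly_income


# Hypothetical example: 1,800 in monthly debt payments on 5,000 of income.
dti = debt_to_income(1_800, 5_000)
print(dti)  # 36.0
```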

To understand the distribution of DTI in 2007 and 2019, I have created two boxplots in Figure 4. We can see that DTI in 2007 was spread over a much larger range of values, with a median of 39. The distribution of DTI in 2019 has much less variance and a median of 36.
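A boxplot summarizes a distribution by its median and quartiles, so the comparison above amounts to comparing medians and interquartile ranges (IQR). The sketch below computes both with Python's standard `statistics` module. The function name `summarize` is hypothetical, and the two lists are made-up values, not the Fannie Mae data.

```python
import statistics


def summarize(values):
    """Return the median and interquartile range of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {"median": statistics.median(values), "iqr": q3 - q1}


# Made-up DTI values for illustration only.
dti_wide = [20, 28, 33, 39, 45, 52, 60]
dti_narrow = [30, 33, 35, 36, 37, 39, 42]

wide = summarize(dti_wide)
narrow = summarize(dti_narrow)
print(wide["median"], narrow["median"])
print(wide["iqr"] > narrow["iqr"])  # True: the first list is more spread out
```

The IQR is used here rather than the full range because it corresponds to the box drawn in a boxplot and is less sensitive to outliers.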

Conclusion

From the analysis above, we found that economic conditions became much more stable between 2007 and 2019. Borrowers showed much lower default rates in 2019 than in 2007. We found that factors such as the loan to value ratio, the debt to income ratio and the credit score are important indicators for banks to take into account when giving out loans, since these factors are associated with loan defaults.

(Word count: 668 words)

Figure Appendix

Figure 1: Histogram of Loan to value 2007


Figure 2: Histogram of Loan to value 2019

Purpose vs loan default

Figure 3: Credit score by category

Figure 4: DTI distribution of defaulters
